Covered
Introduction to hypothesis testing
The standard normal distribution
Standard error
Confidence intervals
Student’s t-distribution
Hypothesis testing
One- and two-sample t-tests
p-values
Useful hypotheses rely on specifying:
- null hypothesis (Ho)
- alternate hypothesis (Ha)
Together Ho and Ha encompass all possible outcomes:
For Example:
Ho: µ=0, Ha: µ ≠ 0
Ho: µ=35, Ha: µ ≠ 35
Tests assess how likely the observed result (or a more extreme one) would be if the null hypothesis were true
Hypothesis tests
Statistical test results:
p = 0.3 means that if Ho is true and I repeated the study 100 times, I would expect this (or a more extreme) result about 30 times due to chance alone
p = 0.03 means that if Ho is true and I repeated the study 100 times, I would expect this (or a more extreme) result about 3 times due to chance alone
Which p-value suggests Ho is likely false?
Statistical test results:
At what point do we reject Ho?
p < 0.05 is the conventional “significance threshold” (α = alpha, the significance level; this is a cut-off chosen in advance, not the p-value itself)
p < 0.05 means: if Ho is true and we repeated the study 100 times - we would get this (or more extreme) result less than 5 times due to chance
Statistical test results:
α is the rate at which we will reject a true null hypothesis (Type I error rate)
Lowering α (e.g., to 0.01 or 0.001) lowers the likelihood of incorrectly rejecting a true null hypothesis
Both the hypotheses and α are specified BEFORE data collection and analysis
Traditionally α=0.05 is used as a cut-off for rejecting the null hypothesis
There is nothing magical about 0.05: actual p-values need to be reported, and α needs to be decided prior to the study
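A small simulation can make the Type I error rate concrete. The sketch below assumes a hypothetical setup (normally distributed data, 20 observations per study, Ho: µ = 0 true by construction) and counts how often a one-sample t-test rejects Ho at two different thresholds. The sample size, seed, and number of simulated studies are arbitrary choices for illustration.

```r
# Simulate many studies in which Ho (mu = 0) is actually true
set.seed(42)          # arbitrary seed so the sketch is reproducible
n_studies <- 10000    # hypothetical number of repeated studies
n_per_study <- 20     # hypothetical sample size per study

p_values <- replicate(
  n_studies,
  t.test(rnorm(n_per_study, mean = 0, sd = 1), mu = 0)$p.value
)

# Proportion of studies that (wrongly) reject the true Ho
type1_rate_05 <- mean(p_values < 0.05)
type1_rate_01 <- mean(p_values < 0.01)

type1_rate_05
type1_rate_01
```

The first proportion should land close to 0.05 and the second close to 0.01, which is what "α is the Type I error rate" means in practice.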
| p-value range | Interpretation |
|---|---|
| P > 0.10 | No evidence against Ho - data appear consistent with Ho |
| 0.05 < P < 0.10 | Weak evidence against Ho in favor of Ha |
| 0.01 < P < 0.05 | Moderate evidence against Ho in favor of Ha |
| 0.001 < P < 0.01 | Strong evidence against Ho in favor of Ha |
| P < 0.001 | Very strong evidence against Ho in favor of Ha |
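The table above can be turned into a small lookup, which may help when labelling many p-values at once. This is only a sketch: `evidence_label()` is a hypothetical helper name, the short labels are abbreviations of the table's wording, and the handling of exact boundary values (e.g., p = 0.05 exactly) is an assumption, since the table leaves them unspecified.

```r
# Map p-values onto the evidence categories from the table
evidence_label <- function(p) {
  cut(
    p,
    breaks = c(0, 0.001, 0.01, 0.05, 0.10, 1),
    labels = c("very strong", "strong", "moderate", "weak", "none"),
    include.lowest = TRUE  # assumption: p = 0 falls in the first bin
  )
}

# The two p-values used in the earlier slides
as.character(evidence_label(c(0.3, 0.03)))
```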
Fisher:
p-value as informal measure of discrepancy between data and Ho
“If p is between 0.1 and 0.9 there is certainly no reason to suspect the hypothesis tested. If it is below 0.02 it is strongly indicated that the hypothesis fails to account for the whole of the facts. We shall not often be astray if we draw a conventional line at .05 …”
General procedure for hypothesis testing:
If p-value < α, conclude Ho is likely false and reject it
If p-value > α, conclude no evidence Ho is false and retain it
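The decision rule above is simple enough to write as a one-line function. The sketch below uses a hypothetical helper, `decide()`, with α defaulting to 0.05; the p-values are the two from the earlier slides.

```r
# Compare a p-value against a pre-specified alpha
decide <- function(p_value, alpha = 0.05) {
  ifelse(p_value < alpha, "reject Ho", "retain Ho")
}

decide(c(0.3, 0.03))        # default alpha = 0.05
decide(0.03, alpha = 0.01)  # a stricter threshold changes the decision
```

The same p-value of 0.03 leads to rejection at α = 0.05 but not at α = 0.01, which is why α has to be fixed before looking at the data.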
Recall…
Recall pine needle example
Probability of getting a sample with ȳ at least as far away from 21 as 35? - p(ȳ ≤ 3500 or ȳ ≥ 3900)
What about - 1-tailed or 2-tailed test?
Can solve using SND and z-scores
z = (21-35)/40 = -0.35
But - usually can’t use z!
Can use t-distribution instead…
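To see why the choice of distribution matters for small samples, the sketch below compares two-tailed p-values for the same test statistic under the standard normal and under a t-distribution. The statistic (-2.0) and sample size (n = 10) are hypothetical values chosen for illustration, not taken from the pine needle example.

```r
# Hypothetical test statistic and sample size
stat <- -2.0
n <- 10
df <- n - 1

# Two-tailed p-value if the statistic followed the standard normal
p_z <- 2 * pnorm(-abs(stat))

# Two-tailed p-value using the t-distribution with n - 1 degrees of freedom
p_t <- 2 * pt(-abs(stat), df = df)

c(z = p_z, t = p_t)
```

The t-distribution has heavier tails, so the same statistic gives a larger p-value; with these values the z-based result falls below 0.05 while the t-based one does not.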
This activity will guide you through the process of conducting single-sample and two-sample t-tests on pine needle data. We’ll explore how environmental factors like wind exposure might affect pine needle length.
You’ll learn to:
A single sample t-test asks whether a population parameter (like the mean, \(\mu\)), as estimated by a sample statistic (like \(\bar{x}\)), differs from some expected value.
The question: Is the average pine needle length from our windward sample different from 55mm?
Used when we want to compare a sample mean to a known or hypothesized population value.
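A common way to write the one-sample t statistic is sketched below. The notation is an assumption rather than the original slide's: \(\bar{x}\) is the sample mean, \(\mu_0\) the hypothesized value (55 mm in this activity), \(s\) the sample standard deviation, and \(n\) the sample size.

```latex
\[
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \text{df} = n - 1
\]
```

The denominator \(s / \sqrt{n}\) is the standard error of the mean.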
where:
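# Load packages: tidyverse (read_csv, dplyr, ggplot2), here (paths), car (qqPlot)
library(tidyverse)
library(here)
library(car)
# Load the pine needle data
# Use here() function to specify the path
pine_data <- read_csv(here("data", "pine_needles.csv"))
# Examine the first few rows
head(pine_data)
# A tibble: 6 × 6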
date group n_s wind tree_no len_mm
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 3/20/25 cephalopods n lee 1 20
2 3/20/25 cephalopods n lee 1 21
3 3/20/25 cephalopods n lee 1 23
4 3/20/25 cephalopods n lee 1 25
5 3/20/25 cephalopods n lee 1 21
6 3/20/25 cephalopods n lee 1 16
Before conducting hypothesis tests, we should always explore our data to understand its characteristics.
Let’s calculate summary statistics and create visualizations.
Activity: Calculate basic summary statistics for pine needle length
# YOUR TASK: Calculate summary statistics for pine needle length
# Hint: Use summarize() function to calculate mean, sd, n, etc.
# Create a summary table for all pine needles
pine_summary <- pine_data %>%
summarize(
mean_length = mean(len_mm),
sd_length = sd(len_mm),
n = n(),
se_length = sd_length / sqrt(n)
)
print(pine_summary)
# A tibble: 1 × 4
mean_length sd_length n se_length
<dbl> <dbl> <int> <dbl>
1 17.7 3.53 48 0.509
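The standard error feeds directly into a confidence interval for the mean. As a rough sketch, the values below are copied from the rounded summary output above, so the resulting interval is approximate; the variable names are illustrative.

```r
# Rounded values from the summary table above
mean_length <- 17.7
sd_length <- 3.53
n <- 48

# Standard error and the t critical value for a 95% interval
se_length <- sd_length / sqrt(n)
t_crit <- qt(0.975, df = n - 1)

# 95% confidence interval: mean +/- t_crit * SE
ci <- c(
  lower = mean_length - t_crit * se_length,
  upper = mean_length + t_crit * se_length
)
ci
```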
Activity: Create visualizations of pine needle length
Create a histogram and a boxplot to visualize the distribution of pine needle length values.
Effective data visualization helps us understand:
# YOUR TASK: Create a histogram of pine needle length
# Hint: Use ggplot() and geom_histogram()
# Histogram of all pine needle lengths
ggplot(pine_data, aes(x = len_mm)) +
geom_histogram(binwidth = 2, fill = "steelblue", color = "black") +
labs(title = "Distribution of Pine Needle Length",
x = "Length (mm)",
y = "Frequency") +
theme_minimal()
We want to test if the mean pine needle length on the windward side differs from 55mm.
Activity: Define hypotheses and identify assumptions
H₀: μ = 55 (The mean pine needle length on windward side is 55mm)
H₁: μ ≠ 55 (The mean pine needle length on windward side is not 55mm)
Before conducting our t-test, we need to verify that our data meets the necessary assumptions.
Activity: Test the normality assumption
Methods to test normality:
Visual methods:
QQ plots or histograms
Statistical tests: Shapiro-Wilk test
# Filter for just windward side needles
windward_data <- pine_data %>%
filter(wind == "wind")
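# YOUR TASK: Test normality of windward pine needle lengths
# QQ Plot (qqPlot() comes from the car package; it prints the indices of the two most extreme points)
qqPlot(windward_data$len_mm,
  main = "QQ Plot for Windward Pine Needles",
  ylab = "Sample Quantiles")
[1] 21 22
# Shapiro-Wilk test
shapiro.test(windward_data$len_mm)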
Shapiro-Wilk normality test
data: windward_data$len_mm
W = 0.96062, p-value = 0.451
Now that we’ve checked our assumptions, we can perform the single sample t-test.
Activity: Conduct a single sample t-test to compare windward needle length to 55mm
What is the probability of getting a sample mean at least as far from 55mm as ours if the null hypothesis is true?
This is our p-value, which helps us decide whether to reject the null hypothesis.
# Calculate summary statistics for windward needles
windward_summary <- windward_data %>%
summarize(
mean_length = mean(len_mm),
sd_length = sd(len_mm),
n = n(),
se_length = sd_length / sqrt(n)
)
print(windward_summary)
# A tibble: 1 × 4
mean_length sd_length n se_length
<dbl> <dbl> <int> <dbl>
1 14.9 1.91 24 0.390
# YOUR TASK: Conduct a single sample t-test
# var.equal only applies to two-sample tests, so it is omitted here
t_test_result <- t.test(windward_data$len_mm, mu = 55)
print(t_test_result)
One Sample t-test
data: windward_data$len_mm
t = -102.85, df = 23, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 55
95 percent confidence interval:
14.11050 15.72284
sample estimates:
mean of x
14.91667
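The output above can be reproduced approximately by hand, which shows where each number comes from. This sketch plugs in the sample mean from the t-test output and the rounded SD from the windward summary table, so the t statistic will differ slightly from the printed -102.85; the variable names are illustrative.

```r
# Values taken from the printed output above (SD is rounded)
x_bar <- 14.91667  # sample mean
s <- 1.91          # sample standard deviation (rounded)
n <- 24            # sample size
mu0 <- 55          # hypothesized mean under Ho

# Standard error, t statistic, degrees of freedom
se <- s / sqrt(n)
t_stat <- (x_bar - mu0) / se
df <- n - 1

# Two-tailed p-value and 95% confidence interval
p_value <- 2 * pt(-abs(t_stat), df = df)
ci <- x_bar + c(-1, 1) * qt(0.975, df = df) * se

t_stat
p_value
ci
```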
Activity: Interpret the t-test results
How to report this result in a scientific paper:
“A two-tailed, one-sample t-test at α=0.05 showed that the mean pine needle length on the windward side (… mm, SD = …) [was/was not] significantly different from the expected 55 mm, t(…) = …, p = …”
Now, let’s compare pine needle lengths between windward and leeward sides of trees.
Question: Is there a significant difference in needle length between the windward and leeward sides?
This requires a two-sample t-test.
Two-sample t-test compares means from two independent groups.
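For reference, the standard (pooled-variance) two-sample t statistic is often written as below. The notation is an assumption: \(\bar{x}_1, \bar{x}_2\) are the group means, \(s_1, s_2\) the group standard deviations, \(n_1, n_2\) the group sizes, and \(s_p\) the pooled standard deviation.

```latex
\[
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
\text{df} = n_1 + n_2 - 2
\]
```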
where:
Activity: Calculate summary statistics grouped by wind exposure
Before conducting the test, we need to understand the data for each group.
# YOUR TASK: Calculate summary statistics by wind exposure
# Hint: Use group_by() and summarize()
group_summary <- pine_data %>%
group_by(wind) %>%
summarize(
mean_length = mean(len_mm),
sd_length = sd(len_mm),
n = n(),
se_length = sd_length / sqrt(n)
)
print(group_summary)
# A tibble: 2 × 5
wind mean_length sd_length n se_length
<chr> <dbl> <dbl> <int> <dbl>
1 lee 20.4 2.45 24 0.500
2 wind 14.9 1.91 24 0.390
Activity: Create visualizations to compare the groups
Effective visualizations for group comparisons:
# YOUR TASK: Create a plot using stat_summary to show means and standard errors
ggplot(pine_data, aes(x = wind, y = len_mm, color = wind)) +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
labs(title = "Mean Pine Needle Length by Wind Exposure",
x = "Wind Exposure",
y = "Mean Length (mm)") +
theme_minimal()
Activity: Test assumptions for two-sample t-test
For a two-sample t-test, we need to check:
If assumptions are violated:
# YOUR TASK: Test normality of pine needle lengths
# QQ Plot of all needles (both wind groups combined)
qqPlot(pine_data$len_mm,
  main = "QQ Plot for All Pine Needles",
  ylab = "Sample Quantiles")
[1] 4 28
# Testing normality for each group
# Leeward group
lee_data <- pine_data %>% filter(wind == "lee")
shapiro_lee <- shapiro.test(lee_data$len_mm)
print("Shapiro-Wilk test for leeward data:")
[1] "Shapiro-Wilk test for leeward data:"
print(shapiro_lee)
Shapiro-Wilk normality test
data: lee_data$len_mm
W = 0.95477, p-value = 0.3425
# Alternative: there is usually more than one way to do this
# Test for normality using Shapiro-Wilk test for each wind group
# All in one pipeline using tidyverse approach
normality_results <- pine_data %>%
group_by(wind) %>%
summarize(
shapiro_stat = shapiro.test(len_mm)$statistic,
shapiro_p_value = shapiro.test(len_mm)$p.value,
normal_distribution = if_else(shapiro_p_value > 0.05, "Normal", "Non-normal")
)
# Print the results
print(normality_results)
# A tibble: 2 × 4
wind shapiro_stat shapiro_p_value normal_distribution
<chr> <dbl> <dbl> <chr>
1 lee 0.955 0.343 Normal
2 wind 0.961 0.451 Normal
Activity: Conduct a two-sample t-test
Now we can compare the mean pine needle lengths between windward and leeward sides.
H₀: μ₁ = μ₂ (The mean needle lengths are equal)
H₁: μ₁ ≠ μ₂ (The mean needle lengths are different)
Deciding between:
# YOUR TASK: Conduct a two-sample t-test
# Use var.equal=TRUE for standard t-test or var.equal=FALSE for Welch's t-test
# Standard t-test (if variances are equal)
t_test_result <- t.test(len_mm ~ wind, data = pine_data, var.equal = TRUE)
print("Standard two-sample t-test:")
[1] "Standard two-sample t-test:"
print(t_test_result)
Two Sample t-test
data: len_mm by wind
t = 8.6792, df = 46, p-value = 3.01e-11
alternative hypothesis: true difference in means between group lee and group wind is not equal to 0
95 percent confidence interval:
4.224437 6.775563
sample estimates:
mean in group lee mean in group wind
20.41667 14.91667
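To connect this output to the underlying arithmetic, the sketch below recomputes the pooled t statistic from the group summaries and, for comparison, the Welch version with its adjusted degrees of freedom. The means come from the t-test output and the SDs are the rounded values from the group summary table, so the results are approximate; variable names are illustrative.

```r
# Group summaries from the output above (SDs are rounded)
m_lee <- 20.41667;  s_lee <- 2.45;  n_lee <- 24
m_wind <- 14.91667; s_wind <- 1.91; n_wind <- 24

# Standard (pooled-variance) t-test
sp2 <- ((n_lee - 1) * s_lee^2 + (n_wind - 1) * s_wind^2) /
  (n_lee + n_wind - 2)
t_pooled <- (m_lee - m_wind) / sqrt(sp2 * (1 / n_lee + 1 / n_wind))
df_pooled <- n_lee + n_wind - 2

# Welch's t-test: separate variances, Welch-Satterthwaite degrees of freedom
v_lee <- s_lee^2 / n_lee
v_wind <- s_wind^2 / n_wind
t_welch <- (m_lee - m_wind) / sqrt(v_lee + v_wind)
df_welch <- (v_lee + v_wind)^2 /
  (v_lee^2 / (n_lee - 1) + v_wind^2 / (n_wind - 1))

c(t_pooled = t_pooled, df_pooled = df_pooled,
  t_welch = t_welch, df_welch = df_welch)
```

Because the two groups have equal sample sizes, the two t statistics coincide here; only the degrees of freedom differ, with Welch's being somewhat smaller than 46.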
Activity: Interpret the results of the two-sample t-test
What can we conclude about the needle lengths on windward vs. leeward sides?
How to report this result in a scientific paper:
“A two-tailed, two-sample t-test at α=0.05 showed [a significant/no significant] difference in needle length between windward (M = …, SD = …) and leeward (M = …, SD = …) sides of pine trees, t(…) = …, p = ….”
If we collected data in pairs (same tree, different sides), we would use a paired t-test. How would the analysis differ?
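One way to see how the analysis would differ: a paired t-test is equivalent to a one-sample t-test on the within-pair differences. The sketch below uses made-up lengths for six hypothetical trees (not the class data) purely to show the mechanics.

```r
# Hypothetical paired measurements: one windward and one leeward value per tree
windward <- c(14, 15, 13, 16, 15, 14)
leeward  <- c(20, 21, 19, 22, 20, 21)

# Paired t-test on the two columns
paired_result <- t.test(leeward, windward, paired = TRUE)

# Equivalent one-sample t-test on the differences, with Ho: mean difference = 0
diff_result <- t.test(leeward - windward, mu = 0)

paired_result$statistic
diff_result$statistic
```

Both calls give the same t statistic and the same degrees of freedom (number of pairs minus one), because pairing removes tree-to-tree variation and leaves only the differences to analyze.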
Paired t-test formula:
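A standard way to write the paired t statistic is given below. The notation is an assumption: \(\bar{d}\) is the mean of the within-pair differences, \(s_d\) their standard deviation, and \(n\) the number of pairs.

```latex
\[
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad \text{df} = n - 1
\]
```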
where:
Common assumptions for t-tests:
What can we do if our data violates these assumptions?
Alternatives when assumptions are violated:
In this activity, we’ve:
Key takeaways: